11,089 research outputs found

    NetScore: Towards Universal Metrics for Large-scale Performance Analysis of Deep Neural Networks for Practical On-Device Edge Usage

    Much of the focus in the design of deep neural networks has been on improving accuracy, leading to more powerful yet highly complex network architectures that are difficult to deploy in practical scenarios, particularly on edge devices such as mobile and other consumer devices given their high computational and memory requirements. As a result, there has been a recent interest in the design of quantitative metrics for evaluating deep neural networks that account for more than just model accuracy as the sole indicator of network performance. In this study, we continue the conversation towards universal metrics for evaluating the performance of deep neural networks for practical on-device edge usage. In particular, we propose a new balanced metric called NetScore, which is designed specifically to provide a quantitative assessment of the balance between accuracy, computational complexity, and network architecture complexity of a deep neural network, which is important for on-device edge operation. In what is one of the largest comparative analyses between deep neural networks in the literature, the NetScore metric, the top-1 accuracy metric, and the popular information density metric were compared across a diverse set of 60 different deep convolutional neural networks for image classification on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset. The evaluation results across these three metrics for this diverse set of networks are presented in this study to act as a reference guide for practitioners in the field. The proposed NetScore metric, along with the other tested metrics, is by no means perfect, but the hope is to push the conversation towards better universal metrics for evaluating deep neural networks for use in practical on-device edge scenarios to help guide practitioners in model design for such scenarios. Comment: 9 pages
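
    The abstract above names the three quantities NetScore balances but does not give a formula. The sketch below is a minimal illustration that assumes the commonly cited form, 20 * log10(a^alpha / (p^beta * m^gamma)), with default exponents alpha = 2, beta = 0.5, gamma = 0.5. The formula, the exponents, the log base, the function names, and the example numbers are all assumptions that do not come from this listing, and they should be checked against the paper. The information density helper likewise assumes the usual "accuracy per parameter" reading.

```python
import math


def netscore(accuracy, params, macs, alpha=2.0, beta=0.5, gamma=0.5):
    """Assumed NetScore form: 20 * log10(a^alpha / (p^beta * m^gamma)).

    accuracy: top-1 accuracy (e.g. in percent)
    params:   parameter count (architecture complexity)
    macs:     multiply-accumulate count (computational complexity)
    Units are an assumption left to the caller; they only need to be
    consistent across the networks being compared.
    """
    if accuracy <= 0 or params <= 0 or macs <= 0:
        raise ValueError("accuracy, params, and macs must all be positive")
    ratio = accuracy ** alpha / (params ** beta * macs ** gamma)
    return 20.0 * math.log10(ratio)


def information_density(accuracy, params):
    """Assumed definition: accuracy per parameter."""
    if params <= 0:
        raise ValueError("params must be positive")
    return accuracy / params


if __name__ == "__main__":
    # Hypothetical networks (made-up numbers, not results from the paper).
    networks = {
        "small_net": {"accuracy": 70.0, "params": 5.0, "macs": 500.0},
        "large_net": {"accuracy": 76.0, "params": 60.0, "macs": 11000.0},
    }
    for name, n in networks.items():
        score = netscore(n["accuracy"], n["params"], n["macs"])
        density = information_density(n["accuracy"], n["params"])
        print(f"{name}: netscore={score:.2f}, info_density={density:.2f}")
```

    Under these assumptions, raising accuracy increases the score quadratically inside the logarithm, while parameters and operations each enter with a square-root penalty, so a smaller network can outscore a slightly more accurate but much heavier one.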

    Tracing Forum Posts to MOOC Content using Topic Analysis

    Massive Open Online Courses are educational programs that are open and accessible to a large number of people through the internet. To facilitate learning, MOOC discussion forums exist where students and instructors communicate questions, answers, and thoughts related to the course. The primary objective of this paper is to investigate tracing discussion forum posts back to course lecture videos and readings using topic analysis. We utilize both unsupervised and supervised variants of Latent Dirichlet Allocation (LDA) to extract topics from course material and classify forum posts. We validate our approach on posts bootstrapped from five Coursera courses and determine that topic models can be used to map student discussion posts back to the underlying course lecture or reading. Labeled LDA outperforms unsupervised Hierarchical Dirichlet Process LDA and base LDA for our traceability task. This research is useful as it provides an automated approach for clustering student discussions by course material, enabling instructors to quickly evaluate student misunderstanding of content and clarify materials accordingly. Comment: 6 pages, 4 figures, Course project for UofA CMPUT 660, Winter 201

    Resolution- and throughput-enhanced spectroscopy using high-throughput computational slit

    There exists a fundamental tradeoff between spectral resolution and the efficiency or throughput for all optical spectrometers. The primary factors affecting the spectral resolution and throughput of an optical spectrometer are the size of the entrance aperture and the optical power of the focusing element. Thus far, collective optimization of these factors has proven difficult. Here, we introduce the concept of high-throughput computational slits (HTCS), a numerical technique for improving both the effective spectral resolution and efficiency of a spectrometer. The proposed HTCS approach was experimentally validated using an optical spectrometer configured with a 200 µm entrance aperture (test) and a 50 µm entrance aperture (control), demonstrating improvements in spectral resolution of ~50% over the control spectral resolution and improvements in efficiency of more than 2 times over the efficiency of the largest entrance aperture used in the study, while producing highly accurate spectra. Comment: 11 pages, 2 figures

    Implications of Computer Vision Driven Assistive Technologies Towards Individuals with Visual Impairment

    Computer vision based technology is becoming ubiquitous in society. One application area that has seen an increase in computer vision is assistive technologies, specifically for those with visual impairment. Research has shown the ability of computer vision models to achieve tasks such as providing scene captions, detecting objects, and recognizing faces. Although assisting individuals with visual impairment with these tasks increases their independence and autonomy, concerns over bias, privacy and potential usefulness arise. This paper addresses the positive and negative implications computer vision based assistive technologies have on individuals with visual impairment, as well as considerations for computer vision researchers and developers in order to mitigate negative implications.

    Enabling Computer Vision Driven Assistive Devices for the Visually Impaired via Micro-architecture Design Exploration

    Recent improvements in object detection have shown potential to aid in tasks that previous solutions were not able to achieve. A particular area is assistive devices for individuals with visual impairment. While state-of-the-art deep neural networks have been shown to achieve superior object detection performance, their high computational and memory requirements make them cost prohibitive for on-device operation. Alternatively, cloud-based operation leads to privacy concerns. Neither option is attractive to potential users. To address these challenges, this study investigates creating an efficient object detection network specifically for OLIV, an AI-powered assistant for object localization for the visually impaired, via micro-architecture design exploration. In particular, we formulate the problem of finding an optimal network micro-architecture as a numerical optimization problem, where we find the set of hyperparameters controlling the MobileNetV2-SSD network micro-architecture that maximizes a modified NetScore objective function for the MSCOCO-OLIV dataset of indoor objects. Experimental results show that such a micro-architecture design exploration strategy leads to a compact deep neural network with a balanced trade-off between accuracy, size, and speed, making it well-suited for enabling on-device computer vision driven assistive devices for the visually impaired.

    COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-Ray Images

    The COVID-19 pandemic continues to have a devastating effect on the health and well-being of the global population. A critical step in the fight against COVID-19 is effective screening of infected patients, with one of the key screening approaches being radiology examination using chest radiography. Motivated by this and inspired by the open source efforts of the research community, in this study we introduce COVID-Net, a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest X-ray (CXR) images that is open source and available to the general public. To the best of the authors' knowledge, COVID-Net is one of the first open source network designs for COVID-19 detection from CXR images at the time of initial release. We also introduce COVIDx, an open access benchmark dataset that we generated comprising 13,975 CXR images across 13,870 patient cases, with the largest number of publicly available COVID-19 positive cases to the best of the authors' knowledge. Furthermore, we investigate how COVID-Net makes predictions using an explainability method in an attempt to not only gain deeper insights into critical factors associated with COVID cases, which can aid clinicians in improved screening, but also audit COVID-Net in a responsible and transparent manner to validate that it is making decisions based on relevant information from the CXR images. By no means a production-ready solution, the hope is that the open access COVID-Net, along with the description on constructing the open source COVIDx dataset, will be leveraged and built upon by both researchers and citizen data scientists alike to accelerate the development of highly accurate yet practical deep learning solutions for detecting COVID-19 cases and accelerate treatment of those who need it the most. Comment: 12 pages

    Affine Variational Autoencoders: An Efficient Approach for Improving Generalization and Robustness to Distribution Shift

    In this study, we propose the Affine Variational Autoencoder (AVAE), a variant of the Variational Autoencoder (VAE) designed to improve robustness by overcoming the inability of VAEs to generalize to distributional shifts in the form of affine perturbations. By optimizing an affine transform to maximize the ELBO, the proposed AVAE transforms an input to the training distribution without the need to increase model complexity to model the full distribution of affine transforms. In addition, we introduce a training procedure to create an efficient model by learning a subset of the training distribution, and using the AVAE to improve generalization and robustness to distributional shift at test time. Experiments on affine perturbations demonstrate that the proposed AVAE significantly improves generalization and robustness to distributional shift in the form of affine perturbations without an increase in model complexity. Comment: 6 pages

    Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory: An Abstract Algebraic Interpretation of Convolutional Neural Networks

    Researchers are actively trying to gain better insights into the representational properties of convolutional neural networks for guiding better network designs and for interpreting a network's computational nature. Gaining such insights can be an arduous task due to the number of parameters in a network and the complexity of a network's architecture. Current approaches to neural network interpretation include Bayesian probabilistic interpretations and information theoretic interpretations. In this study, we take a different approach to studying convolutional neural networks by proposing an abstract algebraic interpretation using finite transformation semigroup theory. Specifically, convolutional layers are broken up and mapped to a finite space. The state space of the proposed finite transformation semigroup is then defined as a single element within the convolutional layer, with the acting elements defined by surrounding state elements combined with convolution kernel elements. Generators of the finite transformation semigroup are defined to complete the interpretation. We leverage this approach to analyze the basic properties of the resulting finite transformation semigroup to gain insights on the representational properties of convolutional neural networks, including insights into quantized network representation. Such a finite transformation semigroup interpretation can also enable better understanding outside of the confines of fixed lattice data structures, making it useful for handling data that lie on irregular lattices. Furthermore, the proposed abstract algebraic interpretation is shown to be viable for interpreting convolutional operations within a variety of convolutional neural network architectures. Comment: 9 pages

    PolyNeuron: Automatic Neuron Discovery via Learned Polyharmonic Spline Activations

    Automated deep neural network architecture design has received a significant amount of recent attention. However, this attention has not been equally shared by one of the fundamental building blocks of a deep neural network, the neurons. In this study, we propose PolyNeuron, a novel automatic neuron discovery approach based on learned polyharmonic spline activations. More specifically, PolyNeuron revolves around learning polyharmonic splines, characterized by a set of control points, that represent the activation functions of the neurons in a deep neural network. A relaxed variant of PolyNeuron, which we term PolyNeuron-R, loosens the constraints imposed by PolyNeuron to reduce the computational complexity for discovering the neuron activation functions in an automated manner. Experiments show both PolyNeuron and PolyNeuron-R lead to networks that have improved or comparable performance on multiple network architectures (LeNet-5 and ResNet-20) using different datasets (MNIST and CIFAR10). As such, automatic neuron discovery approaches such as PolyNeuron are a worthy direction to explore. Comment: 5 pages

    Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets

    The ImageNet dataset ushered in a flood of academic and industry interest in deep learning for computer vision applications. Despite its significant impact, there has not been a comprehensive investigation into the demographic attributes of images contained within the dataset. Such a study could lead to new insights on inherent biases within ImageNet, particularly important given it is frequently used to pretrain models for a wide variety of computer vision tasks. In this work, we introduce a model-driven framework for the automatic annotation of apparent age and gender attributes in large-scale image datasets. Using this framework, we conduct the first demographic audit of the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) subset of ImageNet and the "person" hierarchical category of ImageNet. We find that 41.62% of faces in ILSVRC appear as female, 1.71% appear as individuals above the age of 60, and males aged 15 to 29 account for the largest subgroup with 27.11%. We note that the presented model-driven framework is not fair for all intersectional groups, so annotations are subject to bias. We present this work as the starting point for future development of unbiased annotation models and for the study of downstream effects of imbalances in the demographics of ImageNet. Code and annotations are available at: http://bit.ly/ImageNetDemoAudit Comment: To appear in the Workshop on Fairness Accountability Transparency and Ethics in Computer Vision (FATE CV) at CVPR 201